141 research outputs found

    Audio-assisted movie dialogue detection

    Get PDF
    An audio-assisted system is investigated that detects whether a movie scene is a dialogue or not. The system is based on actor indicator functions, i.e., functions which define whether an actor speaks at a certain time instant. In particular, the cross-correlation and the magnitude of the corresponding cross-power spectral density of a pair of indicator functions are input to various classifiers, such as voted perceptrons, radial basis function networks, random trees, and support vector machines, for dialogue/non-dialogue detection. To boost classifier efficiency, AdaBoost is also exploited. The aforementioned classifiers are trained using ground-truth indicator functions determined by human annotators for 41 dialogue and another 20 non-dialogue audio instances. For testing, actual indicator functions are derived by applying audio activity detection and actor clustering to audio recordings. 23 instances are randomly chosen among the aforementioned instances, 17 of which correspond to dialogue scenes and 6 to non-dialogue ones. Accuracy ranging between 0.739 and 0.826 is reported. © 2008 IEEE
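    A minimal sketch of the feature idea described above, assuming synthetic binary actor indicator functions and using SciPy/scikit-learn; the feature dimensions, classifier settings, and toy data are illustrative assumptions, not the authors' exact configuration:

```python
import numpy as np
from scipy.signal import correlate, csd
from sklearn.svm import SVC

def pair_features(ind_a, ind_b, fs=1.0, n_lags=32, n_freqs=32):
    """Feature vector for one pair of actor indicator functions:
    central cross-correlation lags plus the magnitude of the
    cross-power spectral density (illustrative sizes)."""
    xcorr = correlate(ind_a - ind_a.mean(), ind_b - ind_b.mean(), mode="full")
    mid = len(xcorr) // 2
    xcorr = xcorr[mid - n_lags: mid + n_lags + 1]
    _, pxy = csd(ind_a, ind_b, fs=fs, nperseg=min(256, len(ind_a)))
    cpsd_mag = np.abs(pxy)[:n_freqs]
    return np.concatenate([xcorr, cpsd_mag])

# Toy example: alternating speakers (dialogue-like) vs. a single speaker.
t = np.arange(1000)
dialogue_a = (t // 50) % 2 == 0           # actor A speaks on even segments
dialogue_b = ~dialogue_a                  # actor B fills the gaps
mono_a = np.ones_like(t, dtype=bool)      # one actor speaks throughout
mono_b = np.zeros_like(t, dtype=bool)

X = np.stack([pair_features(dialogue_a.astype(float), dialogue_b.astype(float)),
              pair_features(mono_a.astype(float), mono_b.astype(float))])
y = np.array([1, 0])                      # 1 = dialogue, 0 = non-dialogue
clf = SVC(kernel="rbf").fit(X, y)
print(clf.predict(X))
```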

    Towards Emotion Recognition: A Persistent Entropy Application

    Full text link
    Emotion recognition and classification is a very active area of research. In this paper, we present a first approach to emotion classification using persistent entropy and support vector machines. A topology-based model is applied to obtain a single real number from each raw signal. These data are used as input to a support vector machine to classify signals into 8 different emotions (calm, happy, sad, angry, fearful, disgust and surprised).
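    A short sketch of persistent entropy computed from a persistence diagram (a list of (birth, death) intervals), followed by an SVM on the resulting scalar. The diagrams and labels below are purely illustrative, and the paper's signal-to-diagram step is not reproduced here:

```python
import numpy as np
from sklearn.svm import SVC

def persistent_entropy(diagram):
    """Persistent entropy of a persistence diagram given as (birth, death)
    pairs: H = -sum(p_i * log(p_i)), with p_i the normalized bar lifetimes."""
    lifetimes = np.array([d - b for b, d in diagram if np.isfinite(d)])
    lifetimes = lifetimes[lifetimes > 0]
    p = lifetimes / lifetimes.sum()
    return float(-(p * np.log(p)).sum())

# Illustrative diagrams standing in for signals of two emotion classes.
diagrams = [
    [(0.0, 0.9), (0.1, 0.8), (0.2, 0.3)],   # long, spread-out bars
    [(0.0, 0.2), (0.0, 0.2), (0.1, 0.3)],   # short, uniform bars
]
X = np.array([[persistent_entropy(d)] for d in diagrams])  # one real number per signal
y = np.array([0, 1])                                       # toy emotion labels
clf = SVC(kernel="linear").fit(X, y)
print(X.ravel(), clf.predict(X))
```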

    Speech Emotion Recognition Considering Local Dynamic Features

    Full text link
    Recently, increasing attention has been directed to the study of speech emotion recognition, in which global acoustic features of an utterance are mostly used to eliminate content differences. However, the expression of speech emotion is a dynamic process, which is reflected through dynamic durations, energies, and other prosodic information when one speaks. In this paper, a novel local dynamic pitch probability distribution feature, obtained by drawing a histogram, is proposed to improve the accuracy of speech emotion recognition. Compared with most previous works using global features, the proposed method takes advantage of the local dynamic information conveyed by the emotional speech. Several experiments on the Berlin Database of Emotional Speech are conducted to verify the effectiveness of the proposed method. The experimental results demonstrate that the local dynamic information obtained with the proposed method is more effective for speech emotion recognition than traditional global features. Comment: 10 pages, 3 figures, accepted by ISSP 201
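    A brief sketch of the histogram idea, assuming a pitch contour has already been extracted (the frame values below are made up); the paper's exact binning and normalization are not specified here, so this only approximates a pitch probability distribution feature:

```python
import numpy as np

def pitch_histogram_feature(f0, n_bins=20, fmin=50.0, fmax=500.0):
    """Normalized histogram of voiced pitch values: a per-utterance
    probability distribution over pitch, usable as a feature vector."""
    voiced = f0[(f0 >= fmin) & (f0 <= fmax)]      # drop unvoiced/outlier frames
    counts, _ = np.histogram(voiced, bins=n_bins, range=(fmin, fmax))
    if counts.sum() == 0:
        return np.zeros(n_bins)
    return counts / counts.sum()                  # probabilities summing to 1

# Made-up pitch contour (Hz per frame) for illustration.
f0 = np.array([0, 0, 180, 185, 190, 210, 220, 0, 240, 250, 230, 0, 0])
print(pitch_histogram_feature(f0))
```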

    Speaker-independent emotion recognition exploiting a psychologically-inspired binary cascade classification schema

    No full text
    In this paper, a psychologically-inspired binary cascade classification schema is proposed for speech emotion recognition. Performance is enhanced because commonly confused pairs of emotions are distinguishable from one another. Extracted features are related to statistics of pitch, formants, and energy contours, as well as spectrum, cepstrum, perceptual and temporal features, autocorrelation, MPEG-7 descriptors, Fujisaki's model parameters, voice quality, jitter, and shimmer. Selected features are fed as input to a K-nearest neighbor classifier and to support vector machines. Two kernels are tested for the latter: linear and Gaussian radial basis function. The recently proposed speaker-independent experimental protocol is tested on the Berlin emotional speech database for each gender separately. The best emotion recognition accuracy, achieved by support vector machines with a linear kernel, equals 87.7%, outperforming state-of-the-art approaches. Statistical analysis is first carried out with respect to the classifiers' error rates and then to evaluate the information expressed by the classifiers' confusion matrices. © Springer Science+Business Media, LLC 2011
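    A minimal sketch of a binary cascade: each stage is a binary classifier that splits the remaining set of emotions in two, and a sample is routed down the tree until a single emotion is left. The grouping below, the random features, and the scikit-learn SVMs are assumptions for illustration; they do not reproduce the paper's psychologically-motivated splits or its feature set:

```python
import numpy as np
from sklearn.svm import SVC

class CascadeNode:
    """One binary stage: decide between two groups of emotions, then recurse."""
    def __init__(self, left_labels, right_labels):
        self.left, self.right = set(left_labels), set(right_labels)
        self.clf = SVC(kernel="linear")
        self.left_child = self.right_child = None

    def fit(self, X, y):
        side = np.array([0 if lbl in self.left else 1 for lbl in y])
        self.clf.fit(X, side)
        return self

    def predict_one(self, x):
        group = self.left if self.clf.predict(x[None])[0] == 0 else self.right
        child = self.left_child if group is self.left else self.right_child
        if child is None:                 # leaf group: a single emotion remains
            return next(iter(group))
        return child.predict_one(x)

# Toy data: 4 emotions with random features; the split hierarchy is assumed.
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 10))
y = rng.choice(["anger", "happiness", "sadness", "boredom"], size=80)

root = CascadeNode(["anger", "happiness"], ["sadness", "boredom"]).fit(X, y)
mask_ah = np.isin(y, ["anger", "happiness"])
root.left_child = CascadeNode(["anger"], ["happiness"]).fit(X[mask_ah], y[mask_ah])
root.right_child = CascadeNode(["sadness"], ["boredom"]).fit(X[~mask_ah], y[~mask_ah])

print(root.predict_one(X[0]))
```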

    DigiArt: towards a virtualization of Cultural Heritage

    Get PDF
    DigiArt is a Europe-wide project aimed at providing a new, cost-efficient solution to the capture, processing, and display of cultural artefacts. The project will change the ways in which the public interacts with cultural objects and spaces in a dramatic way. This project is unique in its collaborative approach: cultural heritage professionals working directly with electrical, mechanical, optical, and software engineers to develop a solution to current issues faced by the museum sector. The innovations created by the engineers are driven by the demands of the cultural heritage sector. The diversity of the objects and spaces of the three test museums is challenging the engineers to provide a tool useful for a broad variety of indoor and outdoor museums in the future. This ranges from using Unmanned Aerial Vehicles (UAVs, or drones) to fly over and record large sites, to using scanners to record fine jewellery. As a case study, we present here the use case of Scladina Cave. At the end of the project, the Scladina Cave Archaeological Centre will offer two different visitor experiences. The first uses virtual reality, which will be available anytime, anywhere, to anyone with an internet-connected device. The second will use augmented reality technologies within the cave site. The augmented reality visit will enhance the tour of Scladina by offering experiences that would not be possible were it not for augmented reality, where 3D objects and animations will contribute to a new 3D-immersive experience.

    Tracking the Expression of Annoyance in Call Centers

    Get PDF
    Machine learning researchers have dealt with the identification of emotional cues from speech, since it is a research domain showing a large number of potential applications. Many acoustic parameters have been analyzed when searching for cues to identify emotional categories, and both classical classifiers and outstanding computational approaches have been developed. Experiments have been carried out mainly over induced emotions, even if research has recently been shifting towards spontaneous emotions. In such a framework, it is worth mentioning that the expression of spontaneous emotions depends on cultural factors, on the particular individual, and also on the specific situation. In this work, we were interested in the emotional shifts during conversation. In particular, we aimed to track the annoyance shifts appearing in phone conversations to complaint services. To this end, we analyzed a set of audio files showing different ways to express annoyance. The call center operators found disappointment, impotence, or anger as expressions of annoyance. However, our experiments showed that variations of parameters derived from intensity, combined with some spectral information and suprasegmental features, are very robust for each speaker and annoyance rate. The work also discusses the annotation problem arising when dealing with human labelling of subjective events. In this work we proposed an extended rating scale in order to include annotators' disagreements. Our frame classification results validated the chosen annotation procedure. Experimental results also showed that shifts in customer annoyance rates could potentially be tracked during phone calls. Supported by Spanish Mineco under grant TIN2014-54288-C4-4-R and by the H2020 EU Empathic RIA, action number 769872.
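    A small sketch of frame-level intensity and spectral features of the kind discussed above, assuming librosa is used for extraction; the file path is hypothetical, and the paper's exact parameter set and suprasegmental features are not reproduced:

```python
import numpy as np
import librosa

def frame_features(path, frame_length=2048, hop_length=512):
    """Per-frame RMS intensity and spectral centroid, plus their deltas,
    as a rough stand-in for intensity/spectral cues of annoyance."""
    y, sr = librosa.load(path, sr=None)
    rms = librosa.feature.rms(y=y, frame_length=frame_length, hop_length=hop_length)[0]
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr, hop_length=hop_length)[0]
    n = min(len(rms), len(centroid))
    rms, centroid = rms[:n], centroid[:n]
    feats = np.stack([rms, centroid,
                      np.gradient(rms), np.gradient(centroid)], axis=1)
    return feats   # shape: (n_frames, 4); one row per frame for classification

# Usage (the file name is a placeholder):
# X = frame_features("call_0001.wav")
```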